Dealing with Spatial Data using R

Alessia Calafiore

Overview

  • Introducing R and Spatial Data

  • Mapping Gender Inequality in Scotland

R and Spatial Data

Simple Features for R (the sf package)

From Edzer Pebesma vignette:

  • sf implements a widely used standard in spatial databases that describes how objects in the real world can be represented in computers;
  • A feature can be thought of as a thing, or an object in the real world, such as a building or a tree.
  • Features have a geometry describing where on Earth the feature is located, and they have attributes, which describe other properties.

  • The most common simple features for spatial analysis are POINTS, LINESTRING, POLYGONS

Once installed sf can be loaded as any other R library

library(sf)

Now let’s load some spatial data

boundaries <- st_read("scottish_wards.gpkg")
Reading layer `all_scotland_wards_4th' from data source 
  `C:\Users\Alessia\Documents\rLadiesSpatial\scottish_wards.gpkg' 
  using driver `GPKG'
Simple feature collection with 353 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 5513 ymin: 530249.9 xmax: 470327.5 ymax: 1220301
Projected CRS: OSGB36 / British National Grid
class(boundaries)
[1] "sf"         "data.frame"

Coordinate Reference Systems

Coordinates correspond to specific location on the Earth’s surface on the basis of a coordinate reference system (CRS). A common way to identify a CRS is through its EPSG code.

st_crs(boundaries)[[1]] #EPSG = 27700
[1] "OSGB36 / British National Grid"

CRS can be transformed with sf.

boundaries_WGS84 <- st_transform(boundaries, 4326)
st_crs(boundaries_WGS84)[[1]]
[1] "EPSG:4326"

Geometry operations (some)

On a single object

  • st_centroids()

  • st_buffer()

  • st_simplify()

  • st_union()

Between two objects

  • st_intersection()

  • st_difference()

  • st_union()

Spatial Join

Spatial joins are useful when you want to join attributes of one sf obj with an other object based on some geometry operation.

pts <- st_sample(boundaries[1:2, ], 10) # creates sample points within boundaries
pts <- st_as_sf(pts) #transform it into an sf object 
names(pts)
[1] "x"
names(boundaries[1:2, ])
[1] "CODE"     "Ward_No"  "Review"   "Council"  "ONS_2010" "Name"     "geom"    

As you can see there is no attribute we could use to join these two datasets.

but can we do it spatially?

Let’s try then

pts_data<-st_join(pts, boundaries[1:2,])
head(pts_data)
Simple feature collection with 6 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 384919.8 ymin: 805037 xmax: 394237.3 ymax: 813326.5
Projected CRS: OSGB36 / British National Grid
    CODE Ward_No     Review       Council  ONS_2010
1 00QAMA       1 4th review Aberdeen City S13002476
2 00QAMA       1 4th review Aberdeen City S13002476
3 00QAMA       1 4th review Aberdeen City S13002476
4 00QAMA       1 4th review Aberdeen City S13002476
5 00QAMM      12 4th review Aberdeen City S13002487
6 00QAMM      12 4th review Aberdeen City S13002487
                          Name                  geometry
1 Dyce / Bucksburn / Danestone POINT (386012.5 809104.4)
2 Dyce / Bucksburn / Danestone POINT (384919.8 813326.5)
3 Dyce / Bucksburn / Danestone     POINT (386888 811398)
4 Dyce / Bucksburn / Danestone POINT (386363.9 810260.3)
5            Torry / Ferryhill   POINT (394096.6 805037)
6            Torry / Ferryhill POINT (394237.3 805274.5)

Mapping Gender Inequality in Scotland

Gender Pay Gap and Gender Inequality

  • Broadly, the Gender Pay Gap is the difference in pay between women and men.

  • Globally, women on average, are paid about 20 per cent less than men (ILO).

  • Gender inequality in job opportunities and career.

Measuring Gender Inequality in Scotland

Socio-economic Classification (SeC - Scottish Census 2011)

C1 - Higher managerial, administrative and professional occupations

C2 - Lower managerial, administrative and professional occupations

C3 - Intermediate occupations

C4 - Small employers and own account workers

C5 - Lower supervisory and technical occupations

C6 - Semi-routine occupations

C7 - Routine occupations

C8 - Never worked and long-term unemployed

Let’s load some data

data <- read.csv("LC6121SC.csv")
names(data)
 [1] "Name"   "Gender" "AllPpl" "C1"     "C2"     "C3"     "C4"     "C5"    
 [9] "C6"     "C7"     "C8"    

With these data let’s create a simple equality score (E) to compare the likelihood of belonging to each of these SeC between females and males.

A bit of data wrangling

library(tidyverse)

ward_data <- data %>%
  #filter out country level data
  filter(Name != "Scotland") %>%
  #reshape the data
  pivot_wider(id_cols = Name,
              names_from = Gender,
              values_from = starts_with("C")) %>%
  #compute the score
  mutate(
    C1_gap = ((C1_Females / C1_All) / (C1_Males / C1_All)) * 100,
    C2_gap = ((C2_Females / C2_All) / (C2_Males / C2_All)) * 100,
    C3_gap = ((C3_Females / C3_All) / (C3_Males / C3_All)) * 100,
    C4_gap = ((C4_Females / C4_All) / (C4_Males / C4_All)) * 100,
    C5_gap = ((C5_Females / C5_All) / (C5_Males / C5_All)) * 100,
    C6_gap = ((C6_Females / C6_All) / (C6_Males / C6_All)) * 100,
    C7_gap = ((C7_Females / C7_All) / (C7_Males / C7_All)) * 100,
    C8_gap = ((C8_Females / C8_All) / (C8_Males / C8_All)) * 100
) %>%
select(Name, contains("gap"))

How to interpret the score (E)

  • If E = 100 females and males have the same likelihood to be in a SeC;

  • If E > 100 it is more likely for females than males to be in a SeC;

  • if E < 100 it is more likely for males than females to be in a SeC;

Let’s get some descriptive stats

ward_data %>%
  pivot_longer(cols = contains("gap"),
               names_to = "SeC",
               values_to = "Gap") %>%
  group_by(SeC) %>%
  summarise(
    Min = min(Gap),
    Max = max(Gap),
    Mean = mean(Gap),
    SD = sd(Gap)
  )
# A tibble: 8 × 5
  SeC      Min   Max  Mean    SD
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 C1_gap  37.0  84.9  55.1  9.22
2 C2_gap  81.3 177.  134.  14.2 
3 C3_gap  66.8 510.  287.  57.4 
4 C4_gap  22.2  74.0  43.8  9.55
5 C5_gap  23.8  64.9  38.2  5.90
6 C6_gap  96.0 292.  182.  27.6 
7 C7_gap  34.6 113.   68.2 10.9 
8 C8_gap  45.4 233.  110.  23.9 

Static maps with ggplot2

  • ggplot is a widely used R library, now over 10 years old
  • It is used to make any type of plot, including maps !!!

Let’s make some maps !!!

Now we can make our first map

ggplot()+
  geom_sf(aes(fill = C1_gap),data = ward_data_geo)

Let’s make some changes

ggplot()+
  geom_sf(aes(fill = C1_gap),
          data = ward_data_geo,
          colour = "NA") +
  #change color palette
  scale_fill_gradientn( 
            colors = c("#0002A1", "#332FD0", "#FB2576", "#3F0071")
                ) +
  #add a scale bar
  ggspatial::annotation_scale( 
    location = "bl",
    bar_cols = c("grey60", "white")
    ) +
  #change text 
  labs(fill = "Women every 100 Men\nin Higher Managerial Roles",
       title = "Gender Inequality in Scotland")+
  #customize theme
  theme_void() + 
  theme(
    text = element_text(family = "Futura-Medium",color = "#22211d"),
    legend.title = element_text(family = "Futura-Bold", size = 10),
    legend.text = element_text(family = "Futura-Medium", size = 10),
    plot.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.background = element_rect(fill = "#f5f5f2", color = NA)
    )  

ggplot()+
  geom_sf(aes(fill = C1_gap),
          data = ward_data_geo,
          colour = "NA") +
  #change color palette
  scale_fill_gradientn( 
            colors = c("#0002A1", "#332FD0", "#FB2576", "#3F0071")
                ) +
  #add a scale bar
  ggspatial::annotation_scale(  
    location = "bl",
    bar_cols = c("grey60", "white")
    ) +
  #change text 
  labs(fill = "Women every 100 Men\nin Higher Managerial Roles",
       title = "Gender Inequality in Scotland")+
  #customize theme
  theme_void() + 
  theme(
    text = element_text(family = "Futura-Medium",color = "#22211d"),
    legend.title = element_text(family = "Futura-Bold", size = 10),
    legend.text = element_text(family = "Futura-Medium", size = 10),
    plot.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.background = element_rect(fill = "#f5f5f2", color = NA)
    )  

ggplot()+
  geom_sf(aes(fill = C1_gap),
          data = ward_data_geo,
          colour = "NA") +
  #change color palette
  scale_fill_gradientn( 
            colors = c("#0002A1", "#332FD0", "#FB2576", "#3F0071")
                ) +
  #add a scale bar
  ggspatial::annotation_scale(  
    location = "bl",
    bar_cols = c("grey60", "white")
    ) +
  #change text 
  labs(fill = "Women every 100 Men\nin Higher Managerial Roles",
       title = "Gender Inequality in Scotland")+
  #customize theme
  theme_void() + 
  theme(
    text = element_text(family = "Futura-Medium",color = "#22211d"),
    legend.title = element_text(family = "Futura-Bold", size = 10),
    legend.text = element_text(family = "Futura-Medium", size = 10),
    plot.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.background = element_rect(fill = "#f5f5f2", color = NA)
    )  

and see the result

Focus on Edinburgh

Preparing data to add ward labels

#create polygon centroids to be used as location for the labels
label_poi <- ward_data_geo %>%
  filter(Council %in% c("City of Edinburgh")) %>%
  st_centroid()

#add long lat field
label_poi <- label_poi %>%
  mutate(LONG=as.numeric(st_coordinates(label_poi)[,1]),
         LAT=as.numeric(st_coordinates(label_poi)[,2]))

names(label_poi)
 [1] "CODE"     "Ward_No"  "Review"   "Council"  "ONS_2010" "Name"    
 [7] "C1_gap"   "C2_gap"   "C3_gap"   "C4_gap"   "C5_gap"   "C6_gap"  
[13] "C7_gap"   "C8_gap"   "geom"     "LONG"     "LAT"     

Let’s add the labels to the map

#filter the data
ward_data_geo %>%
  filter(Council %in% c("City of Edinburgh")) %>%
  ggplot() +
  geom_sf(aes(fill = C1_gap),
          colour = "#f5f5f2") +
  scale_fill_gradientn(
            colors = c("#0002A1", "#332FD0", "#FB2576", "#3F0071")
                ) +
  theme_void() +
  ggspatial::annotation_scale(
    location = "br",
    bar_cols = c("grey60", "white")
    ) +
  labs(fill = "Women for every 100 Men\nin Higher Managerial Roles",
       title = "Gender Inequality in Edinburgh")+
  #customize theme
  theme(
    text = element_text(family = "Futura-Medium",color = "#22211d"),
    legend.title = element_text(family = "Futura-Bold", size = 10),
    legend.text = element_text(family = "Futura-Medium", size = 10),
    plot.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.background = element_rect(fill = "#f5f5f2", color = NA)
    )  +
  #add labels
  geom_sf(data = label_poi, color="NA")+
  ggrepel::geom_text_repel(mapping = aes(x=LONG,
                                         y=LAT,
                                         label=Name), 
                           data = label_poi,
                           size=2,
                           color = "white",     # text color
                           bg.color = "grey30", # shadow color
                           bg.r = 0.15,
                           max.overlaps=Inf,
                           segment.color = "black")

#filter the data
ward_data_geo %>%
  filter(Council %in% c("City of Edinburgh")) %>%
  ggplot() +
  geom_sf(aes(fill = C1_gap),
          colour = "#f5f5f2") +
  scale_fill_gradientn(
            colors = c("#0002A1", "#332FD0", "#FB2576", "#3F0071")
                ) +
  theme_void() +
  ggspatial::annotation_scale(
    location = "br",
    bar_cols = c("grey60", "white")
    ) +
  labs(fill = "Women for every 100 Men\nin Higher Managerial Roles",
       title = "Gender Inequality in Edinburgh")+
  #customize theme
  theme(
    text = element_text(family = "Futura-Medium",color = "#22211d"),
    legend.title = element_text(family = "Futura-Bold", size = 10),
    legend.text = element_text(family = "Futura-Medium", size = 10),
    plot.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.background = element_rect(fill = "#f5f5f2", color = NA)
    )  +
  #add labels
  geom_sf(data = label_poi, color="NA")+
  ggrepel::geom_text_repel(mapping = aes(x=LONG,
                                         y=LAT,
                                         label=Name), 
                           data = label_poi,
                           size=2,
                           color = "white",     # text color
                           bg.color = "grey30", # shadow color
                           bg.r = 0.15,
                           max.overlaps=Inf,
                           segment.color = "black")

and see the result

Let’s add some interactivity

Interactive maps with leaflet

Leaflet is JavaScript library for mobile-friendly interactive maps.

This R package makes it easy to integrate and control Leaflet maps in R.

After installed you can load it:

library(leaflet)

Some data prep

#transform CRS to make it compatible with global map tiles prividers
ward_data_geo <- st_transform(ward_data_geo, 4326) %>%
  st_make_valid() %>%
  st_simplify()

#find centroid of Scotland to center the first view
scotland_ct <- ward_data_geo %>%
  st_make_valid() %>%
  st_union() %>%
  st_centroid()

Basic example

leaflet() %>%
  setView(scotland_ct[[1]][1], scotland_ct[[1]][2], zoom = 6) %>%
  # add a dark basemap
  addProviderTiles("CartoDB.DarkMatter") %>%
  # add the polygons of the clusters
  addPolygons(
    data = ward_data_geo,
    color = "#E2E2E2",
    # set the opacity of the outline
    opacity = 1,
    # set the stroke width in pixels
    weight = 1,
    # set the fill opacity
    fillOpacity = 0.2
  ) 

leaflet() %>%
  setView(scotland_ct[[1]][1], scotland_ct[[1]][2], zoom = 6) %>%
  # add a dark basemap
  addProviderTiles("CartoDB.DarkMatter") %>%
  # add the polygons of the clusters
  addPolygons(
    data = ward_data_geo,
    color = "#E2E2E2",
    # set the opacity of the outline
    opacity = 1,
    # set the stroke width in pixels
    weight = 1,
    # set the fill opacity
    fillOpacity = 0.2
  ) 

leaflet() %>%
  setView(scotland_ct[[1]][1], scotland_ct[[1]][2], zoom = 6) %>%
  # add a dark basemap
  addProviderTiles("CartoDB.DarkMatter") %>%
  # add the polygons of the clusters
  addPolygons(
    data = ward_data_geo,
    color = "#E2E2E2",
    # set the opacity of the outline
    opacity = 1,
    # set the stroke width in pixels
    weight = 1,
    # set the fill opacity
    fillOpacity = 0.2
  ) 

Let’s see what we get

Add fill and legend

bins <- c(35, 45, 55, 65, 75, 85)
pal <- colorBin(c("#0002A1", "#332FD0", "#FB2576", "#3F0071"), domain = ward_data_geo$C1_gap, bins = bins)
leaflet(ward_data_geo) %>%
  # center the map view
  setView(scotland_ct[[1]][1], scotland_ct[[1]][2], zoom = 6) %>%
  # add a dark basemap
  addProviderTiles("CartoDB.DarkMatter") %>%
  # change polygon fill
  addPolygons(
    # set the opacity of the outline
    opacity = 0,
    # set the stroke width in pixels
    weight = 0.1,
    # set fill colors
    fillColor = ~pal(C1_gap),
    # set the fill opacity
    fillOpacity = 0.8
  ) %>%
  addLegend(pal = pal, 
            values = ~C1_gap, 
            opacity = 0.7,
            position = "bottomright",
            title = "Women every 100 Men</br>in Higher Managerial Roles")

bins <- c(35, 45, 55, 65, 75, 85)
pal <- colorBin(c("#0002A1", "#332FD0", "#FB2576", "#3F0071"), domain = ward_data_geo$C1_gap, bins = bins)
leaflet(ward_data_geo) %>%
  # center the map view
  setView(scotland_ct[[1]][1], scotland_ct[[1]][2], zoom = 6) %>%
  # add a dark basemap
  addProviderTiles("CartoDB.DarkMatter") %>%
  # change polygon fill
  addPolygons(
    # set the opacity of the outline
    opacity = 0,
    # set the stroke width in pixels
    weight = 0.1,
    # set fill colors
    fillColor = ~pal(C1_gap),
    # set the fill opacity
    fillOpacity = 0.8
  ) %>%
  addLegend(pal = pal, 
            values = ~C1_gap, 
            opacity = 0.7,
            position = "bottomright",
            title = "Women every 100 Men</br>in Higher Managerial Roles")

bins <- c(35, 45, 55, 65, 75, 85)
pal <- colorBin(c("#0002A1", "#332FD0", "#FB2576", "#3F0071"), domain = ward_data_geo$C1_gap, bins = bins)
leaflet(ward_data_geo) %>%
  # center the map view
  setView(scotland_ct[[1]][1], scotland_ct[[1]][2], zoom = 6) %>%
  # add a dark basemap
  addProviderTiles("CartoDB.DarkMatter") %>%
  # change polygon fill
  addPolygons(
    # set the opacity of the outline
    opacity = 0,
    # set the stroke width in pixels
    weight = 0.1,
    # set fill colors
    fillColor = ~pal(C1_gap),
    # set the fill opacity
    fillOpacity = 0.8
  ) %>%
  addLegend(pal = pal, 
            values = ~C1_gap, 
            opacity = 0.7,
            position = "bottomright",
            title = "Women every 100 Men</br>in Higher Managerial Roles")

bins <- c(35, 45, 55, 65, 75, 85)
pal <- colorBin(c("#0002A1", "#332FD0", "#FB2576", "#3F0071"), domain = ward_data_geo$C1_gap, bins = bins)
leaflet(ward_data_geo) %>%
  # center the map view
  setView(scotland_ct[[1]][1], scotland_ct[[1]][2], zoom = 6) %>%
  # add a dark basemap
  addProviderTiles("CartoDB.DarkMatter") %>%
  # change polygon fill
  addPolygons(
    # set the opacity of the outline
    opacity = 0,
    # set the stroke width in pixels
    weight = 0.1,
    # set fill colors
    fillColor = ~pal(C1_gap),
    # set the fill opacity
    fillOpacity = 0.8
  ) %>%
  addLegend(pal = pal, 
            values = ~C1_gap, 
            opacity = 0.7,
            position = "bottomright",
            title = "Women every 100 Men</br>in Higher Managerial Roles")

Add popup

bins <- c(35, 45, 55, 65, 75, 85)
pal <- colorBin(c("#0002A1", "#332FD0", "#FB2576", "#3F0071"), domain = ward_data_geo$C1_gap, bins = bins)
leaflet(ward_data_geo) %>%
  # center the map view
  setView(scotland_ct[[1]][1], scotland_ct[[1]][2], zoom = 6) %>%
  # add a dark basemap
  addProviderTiles("CartoDB.DarkMatter") %>%
  # change polygon fill
  addPolygons(
    # set the opacity of the outline
    opacity = 0,
    # set the stroke width in pixels
    weight = 0.1,
    # set fill colors
    fillColor = ~pal(C1_gap),
    # set the fill opacity
    fillOpacity = 0.8,
    # add popup
    popup = paste('<strong>',"Gap: ",'</strong>', round(ward_data_geo$C1_gap), "<br>",
                  '<strong>',"Ward Name:",'<strong>', ward_data_geo$Name, "<br>",
                  '<strong>',"Council: ",'<strong>', ward_data_geo$Council, "<br>")
  ) %>%
  addLegend(pal = pal, 
            values = ~C1_gap, 
            opacity = 0.7,
            position = "bottomright",
            title = "Women every 100 Men</br>in Higher Managerial Roles")

Conclusions

  • These are not research results, but just some quick data exploration to showcase the R magic !
  • More affluent and urban areas seem to have lower differences between women and men.
  • It would be interesting to explore this more as it demonstrate the importance of intersectional feminism perspectives.
  • There are many more ways/libraries that can be used to make maps.
  • Hope you enjoyed the tutorial

As always

Contacts

acalafio@ed.ac.uk

@alel_domi

@alel@datasci.social